Finding Domain Terms using Wikipedia

Authors

Jorge Vivaldi

Horacio Rodríguez

Abstract

In this paper we present a new approach for obtaining the terminology of a given domain using the category and page structures of Wikipedia in a language-independent way. The idea is to exploit the Wikipedia category graph, starting from a top category that we identify with the name of the domain. After obtaining the full set of categories belonging to the selected domain, the collection of corresponding pages is extracted, subject to some constraints. To reduce noise, a bootstrapping approach involving several iterations is used. At each iteration, the less reliable pages, judged by the balance between on-domain and off-domain categories of the page, are removed, as are the less reliable categories. The set of recovered pages and categories is selected as the initial domain term vocabulary. This approach has been applied to three broad-coverage domains (astronomy, chemistry and medicine) and two languages (English and Spanish), showing promising performance. The resulting set of terms has been evaluated using as reference those terms occurring in WordNet (using Magnini's domain codes) and those appearing in SNOMED-CT (a reference resource for the medical domain available for Spanish).
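The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no scoring formula, thresholds, or extraction constraints, so the on-domain ratio used for pages, the kept-page ratio used for categories, the 0.5 thresholds, and all function names and toy data below are assumptions chosen only to make the idea concrete.

```python
from collections import deque

def collect_categories(subcats, top):
    """Breadth-first walk of the category graph, starting at the top category."""
    seen = {top}
    queue = deque([top])
    while queue:
        cat = queue.popleft()
        for child in subcats.get(cat, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def on_domain_ratio(page, page_cats, domain_cats):
    """Share of a page's categories that are on-domain (assumed scoring rule)."""
    cats = page_cats.get(page, ())
    if not cats:
        return 0.0
    return sum(1 for c in cats if c in domain_cats) / len(cats)

def bootstrap(cat_pages, page_cats, domain_cats,
              page_threshold=0.5, cat_threshold=0.5, max_iter=10):
    """Iteratively drop less reliable pages and categories until nothing changes."""
    cats = set(domain_cats)
    pages = {p for c in cats for p in cat_pages.get(c, ())}
    for _ in range(max_iter):
        kept_pages = {p for p in pages
                      if on_domain_ratio(p, page_cats, cats) >= page_threshold}
        kept_cats = set()
        for c in cats:
            members = cat_pages.get(c, ())
            if not members:
                kept_cats.add(c)  # assumption: purely structural categories are kept
                continue
            kept_share = sum(1 for p in members if p in kept_pages) / len(members)
            if kept_share >= cat_threshold:
                kept_cats.add(c)
        if kept_pages == pages and kept_cats == cats:
            break
        pages, cats = kept_pages, kept_cats
    return pages, cats

# Hypothetical toy data standing in for a real Wikipedia dump.
subcats = {"Astronomy": ["Planets", "Astronomy in fiction"]}
cat_pages = {
    "Astronomy": ["Telescope"],
    "Planets": ["Mars", "Venus"],
    "Astronomy in fiction": ["Space opera", "Star Wars"],
}
page_cats = {
    "Telescope": ["Astronomy", "Optics"],
    "Mars": ["Planets"],
    "Venus": ["Planets"],
    "Space opera": ["Astronomy in fiction", "Science fiction genres", "Film genres"],
    "Star Wars": ["Astronomy in fiction", "Film series", "Media franchises"],
}

domain_cats = collect_categories(subcats, "Astronomy")
terms, kept_categories = bootstrap(cat_pages, page_cats, domain_cats)
```

On this toy data the two fiction pages have only one on-domain category out of three, so they fall below the assumed threshold and are removed, and their category is dropped with them. Real thresholds would need tuning per domain and language.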

Similar resources

Using Wikipedia for Domain Terms Extraction

Domain terms are a useful resource for tuning both resources and NLP processors to domain specific tasks. This paper proposes a method for obtaining terms from potentially any domain using Wikipedia.

Using Domain-specific and Collaborative Resources for Term Translation

In this article we investigate the translation of terms from English into German and vice versa in the isolation of an ontology vocabulary. For this study we built new domain-specific resources from the translation search engine Linguee and from the online encyclopedia Wikipedia. We learned that a domain-specific resource produces better results than a bigger, but more general one. The first find...

Harvesting Domain-Specific Terms using Wikipedia

We present a simple but effective method of automatically extracting domain-specific terms using Wikipedia as training data (i.e. self-supervised learning). Our first goal is to show, using human judgments, that Wikipedia categories are domain-specific and thus can replace manually annotated terms. Second, we show that identifying such terms using harvested Wikipedia categories and entities as s...

Computing Semantic Relatedness using DBPedia

Extracting the semantic relatedness of terms is an important topic in several areas, including data mining, information retrieval and web recommendation. This paper presents an approach for computing the semantic relatedness of terms using the knowledge base of DBpedia — a community effort to extract structured information from Wikipedia. Several approaches to extract semantic relatedness from ...

Using Wikipedia to translate domain-specific terms in SMT

When building a university lecture translation system, one important step is to adapt it to the target domain. One problem in this adaptation task is to acquire translations for domain specific terms. In this approach we tried to get these translations from Wikipedia, which provides articles on very specific topics in many different languages. To extract translations for the domain specific ter...

Publication date: 2010
